Deep Learning of Human Motion
Authors
Abstract
Complex human actions are modelled as graphs over basic attributes and their spatial and temporal relationships. The attribute dictionary and the graph structure are both learned automatically from training data.

$$y(\mathbf{x}, \mathbf{w}) = \sum_{i=0}^{D} w_i x_i$$

1 The activity model

A set $\mathcal{C} = \{c_1, c_2, \ldots, c_C\}$ of $C$ complex activities is considered, where each activity $c_i$ is modelled as a graph $G_i = (V_i, E_i)$, and both the nodes and the edges of the graphs are valued. For convenience, in the following we drop the index $i$ and consider a single graph for a given activity.

The set of nodes $V$ of the graph $G$ is indexed (here by $j$) and corresponds to occurrences of basic attributes. Each node $j$ is assigned a vector of 4 values $\{a_j, x_j, y_j, t_j\}$: the attribute type $a_j$ and a triplet of spatio-temporal coordinates $x_j$, $y_j$ and $t_j$. The edges define pairwise logical spatial or temporal relationships (before, after, overlaps, is included, near, ...). Examples of graphs for the three activities Leaving a baggage unattended, Telephone conversation, and Handshaking between two people are shown in figure 1.

Note that one node of a model graph can be matched with several consecutive attribute occurrences in the test video. For instance, when the model Telephone conversation (figure 1b) is matched, the node Person will in general be matched to multiple occurrences of a person in the video, for as long as the conversation lasts. Also, the graphs shown in the figure are only examples: the actual graphs are learned automatically and will in general not correspond to a graph designed by a human.

The attribute type variables $a_j = k$ may take values $k$ in an alphabet $\Lambda = \{1, \ldots, L\}$. These values can correspond to fixed (manually designed) types, for instance Person, or to automatically learned attributes. Associated with each possible type $k$ is a feature function, indexed by $k$, that maps $(v, x, y, t; \Theta_k) \to \{0, 1\}$ and evaluates whether the attribute is found ($= 1$) or not ($= 0$) in the spatio-temporal block of the video $v$ centered on $(x, y, t)$. The parameters of these functions, which are to be learned, are denoted by $\Theta_k$.

Each edge $e_{jk}$ between two nodes $j$ and $k$ is assigned an edge label, which may take values in an alphabet $\Upsilon$ (before, after, overlaps, is included, near, ...). There may be multiple edges between the same pair of nodes.

Multi-layer Perceptron (MLP): "fully-connected" layers

5.1. Feed-forward Network Functions

... notation for the two kinds of model. We shall see later how to give a probabilistic interpretation to a neural network. As discussed in Section 3.1, the bias parameters in (5.2) can be absorbed into the set of weight parameters by defining an additional input variable $x_0$ whose value is clamped at $x_0 = 1$, so that (5.2) takes the form

$$a_j = \sum_{i=0}^{D} w_{ji} x_i. \tag{5.8}$$

We can similarly absorb the second-layer biases into the second-layer weights, so that the overall network function becomes $y_k(\mathbf{x}, \mathbf{w}) = \sigma\bigl(\sum^{M} \ldots$
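The bias-absorption step behind equation (5.8) can be checked numerically. The following is a minimal sketch in plain Python, not code from the source: the function names (`affine`, `absorb_bias`, `affine_absorbed`) and the example weights are hypothetical, chosen only for illustration. It compares the activations computed with an explicit bias against those computed with the bias stored as weight $w_{j0}$ and an extra input clamped at $x_0 = 1$.

```python
def affine(weights, bias, x):
    """Explicit-bias form: a_j = sum_{i=1..D} w_ji * x_i + b_j."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def absorb_bias(weights, bias):
    """Store each bias b_j as an extra leading weight w_j0."""
    return [[b] + list(row) for row, b in zip(weights, bias)]


def affine_absorbed(aug_weights, x):
    """Absorbed form: a_j = sum_{i=0..D} w_ji * x_i with x_0 clamped at 1."""
    x_aug = [1.0] + list(x)
    return [sum(w * xi for w, xi in zip(row, x_aug)) for row in aug_weights]


# Hypothetical example values: 2 units, D = 3 inputs.
W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
b = [0.1, -0.2]
x = [1.0, 2.0, 3.0]

a_explicit = affine(W, b, x)
a_absorbed = affine_absorbed(absorb_bias(W, b), x)

# The two forms agree up to floating-point rounding.
assert all(abs(p - q) < 1e-12 for p, q in zip(a_explicit, a_absorbed))
```

Both forms describe the same linear map; the absorbed form only moves the bias into the weight matrix, so the sum can start at $i = 0$.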
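The activity model of Section 1 (valued nodes plus labelled edges, with several edges allowed between the same pair of nodes) could be represented roughly as follows. This is an illustrative sketch only: the class names, the attribute codes `PERSON` and `BAG`, the coordinates, and the chosen edges are assumptions, not the learned graphs of figure 1, and the label set lists only the relations named in the text, although the source alphabet is open-ended.

```python
from dataclasses import dataclass, field

# Edge labels named in the text; the source alphabet is open-ended ("...").
EDGE_LABELS = {"before", "after", "overlaps", "is_included", "near"}


@dataclass(frozen=True)
class Node:
    attribute: int  # attribute type a_j, a value in {1, ..., L}
    x: float        # spatial coordinate x_j
    y: float        # spatial coordinate y_j
    t: float        # temporal coordinate t_j


@dataclass
class ActivityGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (j, k, label) triples

    def add_node(self, attribute, x, y, t):
        """Append a node and return its index j."""
        self.nodes.append(Node(attribute, x, y, t))
        return len(self.nodes) - 1

    def add_edge(self, j, k, label):
        """Add a labelled edge; parallel edges between j and k are allowed."""
        if label not in EDGE_LABELS:
            raise ValueError(f"unknown edge label: {label!r}")
        n = len(self.nodes)
        if not (0 <= j < n and 0 <= k < n):
            raise IndexError("edge endpoint is not a node of the graph")
        self.edges.append((j, k, label))

    def labels_between(self, j, k):
        """Return all labels on edges going from node j to node k."""
        return [lab for (a, b, lab) in self.edges if (a, b) == (j, k)]


# Hypothetical attribute codes and coordinates, for illustration only.
PERSON, BAG = 1, 2
g = ActivityGraph()
p = g.add_node(PERSON, x=0.40, y=0.50, t=10.0)
bag = g.add_node(BAG, x=0.45, y=0.60, t=12.0)
g.add_edge(p, bag, "near")
g.add_edge(p, bag, "before")  # second edge between the same pair of nodes
```

Storing edges as a list of triples rather than a dictionary keyed by node pair is one simple way to allow multiple edges between the same two nodes.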
Similar resources
RGB-D-based Human Motion Recognition with Deep Learning: A Survey
Human motion recognition is one of the most important branches of human-centered research activities. In recent years, motion recognition based on RGB-D data has attracted much attention. Along with the development in artificial intelligence, deep learning techniques have gained remarkable success in computer vision. In particular, convolutional neural networks (CNN) have achieved great success...
Adaptive Filtering Strategy to Remove Noise from ECG Signals Using Wavelet Transform and Deep Learning
Introduction: An electrocardiogram (ECG) measures the electrical activity of the heart and is recorded by placing electrodes on the surface of the body. Physicians use observation tools to detect and diagnose heart diseases; cardiologists do the same with ECG signals. In particular, heart diseases are recognized by examining the graphic representation of heart signals w...
A Deep Learning Approach for Motion Retargeting
Motion retargeting is the process of copying motion from one (source) character to another (target) character whose body sizes and proportions (e.g., arms, legs, torso, and so on) differ. One of the simplest ways to retarget human motion is to modify its joint angles manually, one at a time; however, it would be a difficult and tedious task to do this for all the joints of the given ...
Learning a bidirectional mapping between human whole-body motion and natural language using deep recurrent neural networks
Linking human whole-body motion and natural language is of great interest for the generation of semantic representations of observed human behaviors as well as for the generation of robot behaviors based on natural language input. While there has been a large body of research in this area, most approaches that exist today require a symbolic representation of motions (e.g. in the form of motion ...
Cross-Country Skiing Gears Classification using Deep Learning
Human Activity Recognition has witnessed significant progress in the last decade. Although a great deal of work in this field goes into recognizing normal human activities, few studies have focused on identifying motion in sports. Recognizing human movements in different sports has a high impact on understanding the different styles of humans in play and on improving their performance. As deep lea...
Face Expression Recognition Based on Motion Templates and 4-layer Deep Learning Neural Network
A human facial expression is formed by facial muscle movement. In our previous research, we proposed a method of identifying facial muscle movement based on motion templates and GentleBoost. But the method was not robust enough to recognize human expressions because of an insufficient learning stage. So in this paper, we propose a new method based on motion templates and 4-layer deep lear...
Journal title:
Volume / issue:
Pages: -
Publication date: 2016